Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 1400 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 65.8 KiB |
| Average record size in memory | 48.1 B |
Variable types
| Numeric | 5 |
|---|---|
| Categorical | 1 |
Alerts
| Alert | Type |
|---|---|
| humidity is highly correlated with temperature and 4 other fields | High correlation |
| water availability is highly correlated with temperature and 3 other fields | High correlation |
| label is highly correlated with temperature and 4 other fields | High correlation |
| season is highly correlated with temperature and 4 other fields | High correlation |
| temperature is highly correlated with humidity and 3 other fields | High correlation |
| ph is highly correlated with humidity and 2 other fields | High correlation |
| label has 100 (7.1%) zeros | Zeros |
Reproduction
| Analysis started | 2023-03-03 12:16:08.202051 |
|---|---|
| Analysis finished | 2023-03-03 12:16:15.347807 |
| Duration | 7.15 seconds |
| Software version | pandas-profiling v3.4.0 |
Variables
temperature
| Distinct | 1300 |
|---|---|
| Distinct (%) | 92.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.97162055 |
| Minimum | 15.33042636 |
|---|---|
| Maximum | 36.97794384 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.1 KiB |
Quantile statistics
| Minimum | 15.33042636 |
|---|---|
| 5-th percentile | 18.25405352 |
| Q1 | 22.17823907 |
| median | 25.14024451 |
| Q3 | 27.96322684 |
| 95-th percentile | 31.22032229 |
| Maximum | 36.97794384 |
| Range | 21.64751748 |
| Interquartile range (IQR) | 5.784987767 |
Descriptive statistics
| Standard deviation | 4.081622446 |
|---|---|
| Coefficient of variation (CV) | 0.1634504432 |
| Kurtosis | -0.3263094809 |
| Mean | 24.97162055 |
| Median Absolute Deviation (MAD) | 2.865742815 |
| Skewness | 0.01907108973 |
| Sum | 34960.26877 |
| Variance | 16.65964179 |
| Monotonicity | Not monotonic |
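The derived rows of the two tables above follow directly from the basic statistics: range is maximum minus minimum, IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, variance is the squared standard deviation, and sum is the mean times the number of observations. As a minimal consistency check, the sketch below re-derives them from the reported temperature figures. `derived_stats` is a hypothetical helper written for this check, not part of pandas-profiling.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std, n):
    """Re-derive the report's derived statistics from its basic ones."""
    return {
        "range": maximum - minimum,   # Range = Maximum - Minimum
        "iqr": q3 - q1,               # IQR = Q3 - Q1
        "cv": std / mean,             # CV = standard deviation / mean
        "variance": std ** 2,         # variance = standard deviation squared
        "sum": mean * n,              # Sum = mean * number of observations
    }


# Values copied from the temperature tables (n = 1400 observations).
temperature = derived_stats(
    minimum=15.33042636, maximum=36.97794384,
    q1=22.17823907, q3=27.96322684,
    mean=24.97162055, std=4.081622446, n=1400,
)
for name, value in temperature.items():
    print(f"{name:>8}: {value:.6f}")
```

The same check applies unchanged to humidity, ph and water availability by substituting their reported values.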
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 25.33797709 | 2 | 0.1% |
| 21.869274 | 2 | 0.1% |
| 23.39128187 | 2 | 0.1% |
| 18.41932981 | 2 | 0.1% |
| 20.27317074 | 2 | 0.1% |
| 24.71417533 | 2 | 0.1% |
| 22.61359953 | 2 | 0.1% |
| 26.10018422 | 2 | 0.1% |
| 23.55882094 | 2 | 0.1% |
| 19.97215954 | 2 | 0.1% |
| Other values (1290) | 1380 | 98.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.33042636 | 1 | 0.1% |
| 15.43546065 | 1 | 0.1% |
| 15.46789263 | 1 | 0.1% |
| 15.53834801 | 1 | 0.1% |
| 15.77370214 | 1 | 0.1% |
| 15.78601387 | 1 | 0.1% |
| 16.03768615 | 1 | 0.1% |
| 16.06522754 | 1 | 0.1% |
| 16.24469193 | 1 | 0.1% |
| 16.43340342 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 36.97794384 | 1 | 0.1% |
| 36.89163721 | 1 | 0.1% |
| 36.75087487 | 1 | 0.1% |
| 36.51268371 | 1 | 0.1% |
| 36.30049702 | 1 | 0.1% |
| 36.20970524 | 1 | 0.1% |
| 36.04353699 | 1 | 0.1% |
| 36.00415838 | 1 | 0.1% |
| 35.95176642 | 1 | 0.1% |
| 35.45790488 | 1 | 0.1% |
humidity
| Distinct | 1300 |
|---|---|
| Distinct (%) | 92.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 64.61106202 |
| Minimum | 14.25803981 |
|---|---|
| Maximum | 94.96218673 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.1 KiB |
Quantile statistics
| Minimum | 14.25803981 |
|---|---|
| 5-th percentile | 17.92929023 |
| Q1 | 56.82421692 |
| median | 68.28832112 |
| Q3 | 82.71040882 |
| 95-th percentile | 91.21936824 |
| Maximum | 94.96218673 |
| Range | 80.70414692 |
| Interquartile range (IQR) | 25.88619191 |
Descriptive statistics
| Standard deviation | 22.75378493 |
|---|---|
| Coefficient of variation (CV) | 0.3521654686 |
| Kurtosis | -0.2400944603 |
| Mean | 64.61106202 |
| Median Absolute Deviation (MAD) | 13.76174229 |
| Skewness | -0.8991506833 |
| Sum | 90455.48682 |
| Variance | 517.7347287 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 68.49835977 | 2 | 0.1% |
| 61.91044947 | 2 | 0.1% |
| 61.74427165 | 2 | 0.1% |
| 64.23580251 | 2 | 0.1% |
| 63.91281869 | 2 | 0.1% |
| 56.73426469 | 2 | 0.1% |
| 63.69070564 | 2 | 0.1% |
| 71.57476937 | 2 | 0.1% |
| 71.59351368 | 2 | 0.1% |
| 57.68272924 | 2 | 0.1% |
| Other values (1290) | 1380 | 98.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.25803981 | 1 | 0.1% |
| 14.27327988 | 1 | 0.1% |
| 14.2804191 | 1 | 0.1% |
| 14.32313811 | 1 | 0.1% |
| 14.33847406 | 1 | 0.1% |
| 14.42457525 | 1 | 0.1% |
| 14.44008871 | 1 | 0.1% |
| 14.44228303 | 1 | 0.1% |
| 14.69765308 | 1 | 0.1% |
| 14.70085967 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 94.96218673 | 1 | 0.1% |
| 94.87679041 | 1 | 0.1% |
| 94.86907886 | 1 | 0.1% |
| 94.81637388 | 1 | 0.1% |
| 94.79453182 | 1 | 0.1% |
| 94.78993038 | 1 | 0.1% |
| 94.72981338 | 1 | 0.1% |
| 94.65343534 | 1 | 0.1% |
| 94.57459443 | 1 | 0.1% |
| 94.55695552 | 1 | 0.1% |
ph
| Distinct | 1300 |
|---|---|
| Distinct (%) | 92.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.565245964 |
| Minimum | 3.504752314 |
|---|---|
| Maximum | 9.93509073 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.1 KiB |
Quantile statistics
| Minimum | 3.504752314 |
|---|---|
| 5-th percentile | 5.386097545 |
| Q1 | 6.06879526 |
| median | 6.524478032 |
| Q3 | 7.042342972 |
| 95-th percentile | 7.881354651 |
| Maximum | 9.93509073 |
| Range | 6.430338416 |
| Interquartile range (IQR) | 0.9735477118 |
Descriptive statistics
| Standard deviation | 0.8351014206 |
|---|---|
| Coefficient of variation (CV) | 0.127200325 |
| Kurtosis | 1.662209163 |
| Mean | 6.565245964 |
| Median Absolute Deviation (MAD) | 0.4874527745 |
| Skewness | 0.1729917217 |
| Sum | 9191.34435 |
| Variance | 0.6973943826 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6.586244581 | 2 | 0.1% |
| 5.850439831 | 2 | 0.1% |
| 5.871647806 | 2 | 0.1% |
| 6.474476516 | 2 | 0.1% |
| 6.439071996 | 2 | 0.1% |
| 6.648725327 | 2 | 0.1% |
| 5.749914421 | 2 | 0.1% |
| 6.931756558 | 2 | 0.1% |
| 6.657964753 | 2 | 0.1% |
| 6.596060648 | 2 | 0.1% |
| Other values (1290) | 1380 | 98.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.504752314 | 1 | 0.1% |
| 3.510404312 | 1 | 0.1% |
| 3.5253661 | 1 | 0.1% |
| 3.532008668 | 1 | 0.1% |
| 3.558822825 | 1 | 0.1% |
| 3.692863601 | 1 | 0.1% |
| 3.71105919 | 1 | 0.1% |
| 3.793575185 | 1 | 0.1% |
| 3.808429173 | 1 | 0.1% |
| 3.828031463 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.93509073 | 1 | 0.1% |
| 9.926212291 | 1 | 0.1% |
| 9.679240873 | 1 | 0.1% |
| 9.45949344 | 1 | 0.1% |
| 9.416003106 | 1 | 0.1% |
| 9.406887533 | 1 | 0.1% |
| 9.392694614 | 1 | 0.1% |
| 9.254089438 | 1 | 0.1% |
| 9.160691747 | 1 | 0.1% |
| 9.112771682 | 1 | 0.1% |
water availability
| Distinct | 1300 |
|---|---|
| Distinct (%) | 92.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 91.7846509 |
| Minimum | 20.21126747 |
|---|---|
| Maximum | 298.5601175 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.1 KiB |
Quantile statistics
| Minimum | 20.21126747 |
|---|---|
| 5-th percentile | 26.51740556 |
| Q1 | 51.54654173 |
| median | 72.37918315 |
| Q3 | 107.4283336 |
| 95-th percentile | 209.9458982 |
| Maximum | 298.5601175 |
| Range | 278.34885 |
| Interquartile range (IQR) | 55.88179184 |
Descriptive statistics
| Standard deviation | 58.68225767 |
|---|---|
| Coefficient of variation (CV) | 0.6393471795 |
| Kurtosis | 1.312178796 |
| Mean | 91.7846509 |
| Median Absolute Deviation (MAD) | 24.40939669 |
| Skewness | 1.364459853 |
| Sum | 128498.5113 |
| Variance | 3443.607365 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 96.46380213 | 2 | 0.1% |
| 107.2681929 | 2 | 0.1% |
| 107.3198135 | 2 | 0.1% |
| 76.41312437 | 2 | 0.1% |
| 62.50351892 | 2 | 0.1% |
| 88.45361858 | 2 | 0.1% |
| 87.75953857 | 2 | 0.1% |
| 102.2662445 | 2 | 0.1% |
| 66.71995467 | 2 | 0.1% |
| 60.65171481 | 2 | 0.1% |
| Other values (1290) | 1380 | 98.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20.21126747 | 1 | 0.1% |
| 20.36001144 | 1 | 0.1% |
| 20.39020503 | 1 | 0.1% |
| 20.49035619 | 1 | 0.1% |
| 20.66127836 | 1 | 0.1% |
| 20.76212031 | 1 | 0.1% |
| 20.76223014 | 1 | 0.1% |
| 20.76582087 | 1 | 0.1% |
| 20.88620369 | 1 | 0.1% |
| 21.0000988 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 298.5601175 | 1 | 0.1% |
| 298.4018471 | 1 | 0.1% |
| 295.9248796 | 1 | 0.1% |
| 295.6094492 | 1 | 0.1% |
| 291.2986618 | 1 | 0.1% |
| 290.6793783 | 1 | 0.1% |
| 287.5766935 | 1 | 0.1% |
| 286.5083725 | 1 | 0.1% |
| 285.2493645 | 1 | 0.1% |
| 284.4364567 | 1 | 0.1% |
season
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 11.1 KiB |
| Value | Count |
|---|---|
| 0 | 600 |
| 1 | 400 |
| 3 | 300 |
| 2 | 100 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1400 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
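As a rough sketch of this kind of character analysis, Python's standard `unicodedata.category` returns a two-letter general-category code for each character (`Nd` for decimal digits). The snippet rebuilds the season column from the counts in its Common Values table and tallies characters and categories. The `CATEGORY_NAMES` lookup is an assumption that spells out only a few codes; the standard library exposes no script or block property, so the Common script and ASCII block figures are not reproduced here.

```python
import unicodedata
from collections import Counter

# Hypothetical lookup from two-letter category codes to readable names;
# it covers only a handful of codes and falls back to the raw code.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter",
                  "Ll": "Lowercase Letter", "Zs": "Space Separator"}


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Season column rebuilt from its value counts (0: 600, 1: 400, 3: 300, 2: 100).
season = ["0"] * 600 + ["1"] * 400 + ["3"] * 300 + ["2"] * 100
char_counts = Counter("".join(season))

print(char_counts.most_common())   # most occurring characters
print(category_counts(season))     # most occurring categories
```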
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 600 | 42.9% |
| 1 | 400 | 28.6% |
| 3 | 300 | 21.4% |
| 2 | 100 | 7.1% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 600 | 42.9% |
| 1 | 400 | 28.6% |
| 3 | 300 | 21.4% |
| 2 | 100 | 7.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1400 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 600 | 42.9% |
| 1 | 400 | 28.6% |
| 3 | 300 | 21.4% |
| 2 | 100 | 7.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1400 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 600 | 42.9% |
| 1 | 400 | 28.6% |
| 3 | 300 | 21.4% |
| 2 | 100 | 7.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1400 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 600 | 42.9% |
| 1 | 400 | 28.6% |
| 3 | 300 | 21.4% |
| 2 | 100 | 7.1% |
label
| Distinct | 13 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.642857143 |
| Minimum | 0 |
|---|---|
| Maximum | 12 |
| Zeros | 100 |
| Zeros (%) | 7.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 11.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 5.5 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 12 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 3.82996617 |
|---|---|
| Coefficient of variation (CV) | 0.6787281821 |
| Kurtosis | -1.286860175 |
| Mean | 5.642857143 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 0.1216937038 |
| Sum | 7900 |
| Variance | 14.66864087 |
| Monotonicity | Not monotonic |
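Because label takes only 13 integer values, its whole distribution is recoverable from the value counts reported for this variable: every value from 0 to 12 occurs 100 times, except 1, which occurs 200 times. A minimal sketch with Python's standard `statistics` module re-derives the statistics above exactly, assuming (as the numbers bear out) that variance and standard deviation use the sample (n - 1) denominator, that MAD is the median of absolute deviations from the median, and that quantiles use linear interpolation.

```python
import statistics

# Value counts of label: each value 0..12 occurs 100 times, except 1 (200 times).
counts = {value: 100 for value in range(13)}
counts[1] = 200
label = [value for value, count in sorted(counts.items()) for _ in range(count)]

n = len(label)
mean = statistics.mean(label)
median = statistics.median(label)
variance = statistics.variance(label)   # sample variance, n - 1 denominator
std = statistics.stdev(label)
cv = std / mean                         # coefficient of variation
mad = statistics.median(abs(x - median) for x in label)  # median absolute deviation
# "inclusive" interpolates linearly between order statistics.
q1, _, q3 = statistics.quantiles(label, n=4, method="inclusive")

print(n, sum(label), mean, median, variance, std, cv, mad, q1, q3)
```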
Histogram with fixed size bins (bins=13)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 200 | 14.3% |
| 0 | 100 | 7.1% |
| 2 | 100 | 7.1% |
| 3 | 100 | 7.1% |
| 4 | 100 | 7.1% |
| 5 | 100 | 7.1% |
| 6 | 100 | 7.1% |
| 7 | 100 | 7.1% |
| 8 | 100 | 7.1% |
| 9 | 100 | 7.1% |
| Other values (3) | 300 | 21.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 100 | 7.1% |
| 1 | 200 | 14.3% |
| 2 | 100 | 7.1% |
| 3 | 100 | 7.1% |
| 4 | 100 | 7.1% |
| 5 | 100 | 7.1% |
| 6 | 100 | 7.1% |
| 7 | 100 | 7.1% |
| 8 | 100 | 7.1% |
| 9 | 100 | 7.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 100 | 7.1% |
| 11 | 100 | 7.1% |
| 10 | 100 | 7.1% |
| 9 | 100 | 7.1% |
| 8 | 100 | 7.1% |
| 7 | 100 | 7.1% |
| 6 | 100 | 7.1% |
| 5 | 100 | 7.1% |
| 4 | 100 | 7.1% |
| 3 | 100 | 7.1% |
Auto
The auto setting is an easily interpretable pairwise column metric that picks a method according to the variable types of each pair: categorical-categorical pairs use Cramér's V, numerical-categorical pairs use Cramér's V on a discretized version of the numerical column, and numerical-numerical pairs use Spearman's ρ. This configuration thus applies the most suitable method to each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates a perfect positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
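The auto mapping and the Spearman definition above can be sketched together in plain Python. This is a minimal, dependency-free illustration, not pandas-profiling's implementation: `auto_corr`, `discretize` and `cramers_v` are hypothetical helpers, the library may apply a bias correction to Cramér's V and uses its own discretization settings, whereas this sketch uses the textbook uncorrected V and equal-width bins with an assumed bin count of 10. Spearman's ρ is computed as described: Pearson's r on the ranks, with ties sharing their average rank.

```python
import math
from collections import Counter


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the common 1/(n - 1) factors cancel out


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(rank(x), rank(y))


def cramers_v(x, y):
    """Textbook (bias-uncorrected) Cramér's V from the contingency table of x and y."""
    n = len(x)
    joint = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def discretize(values, n_bins=10):
    """Equal-width binning; n_bins=10 is an assumed value, not the library's setting."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def auto_corr(x, y, kind_x, kind_y):
    """Pick the method from the variable types, following the 'auto' mapping."""
    if kind_x == "numerical" and kind_y == "numerical":
        return spearman(x, y)
    if kind_x == "numerical":
        x = discretize(x)
    if kind_y == "numerical":
        y = discretize(y)
    return cramers_v(x, y)


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
rho = auto_corr(x, y, "numerical", "numerical")
r = pearson(x, y)
v_assoc = auto_corr(["a", "a", "b", "b"], ["u", "u", "v", "v"], "categorical", "categorical")
v_indep = auto_corr(["a", "a", "b", "b"], ["u", "v", "u", "v"], "categorical", "categorical")
print(rho, r, v_assoc, v_indep)
```

On the squared series, ρ is 1 (up to floating-point rounding) while Pearson's r stays just below 1, which is the "nonlinear monotonic" advantage described above.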
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables, which implies that for a linear relationship the slope (the angle to the x-axis) does not affect r apart from its sign. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
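A minimal sketch of this definition in plain Python, with sample covariance and standard deviations (the n - 1 factors cancel, so the choice of denominator does not change r). The toy series `x` and `y` are made up for illustration; the checks show the location/scale invariance and that exact lines of any nonzero slope give r = ±1.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(x, y)
r_shifted_scaled = pearson(x, [3 * v + 7 for v in y])  # location and positive scale change
r_steep_line = pearson(x, [10 * v for v in x])         # exact line, slope 10
r_falling_line = pearson(x, [-0.5 * v for v in x])     # exact line, negative slope
print(r, r_shifted_scaled, r_steep_line, r_falling_line)
```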
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation and +1 indicates a perfect positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
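The pair-counting rule above is the τ-a variant, sketched below in plain Python under the assumption of small inputs (it checks all n(n - 1)/2 pairs). `kendall_tau_a` is a hypothetical helper; with tied values, libraries such as SciPy default to the tie-adjusted τ-b, which this sketch does not implement.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    A pair tied in x or y counts as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Swapping the middle two ranks makes 1 of the 6 pairs discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```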
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
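For the count view, a minimal sketch assuming rows are dictionaries with missing values stored as `None`: it tallies non-null cells per column and builds a crude text stand-in for a nullity matrix (`#` present, `.` missing). The rows are toy data with missing values inserted on purpose, since the profiled dataset itself has no missing cells, and `nullity_summary` is a hypothetical helper.

```python
def nullity_summary(rows, columns):
    """Per-column non-null counts plus one matrix line per row.

    Missing values are assumed to be represented as None.
    """
    counts = {c: sum(row.get(c) is not None for row in rows) for c in columns}
    matrix = ["".join("#" if row.get(c) is not None else "." for c in columns)
              for row in rows]
    return counts, matrix


columns = ["temperature", "humidity", "ph"]
rows = [  # toy rows; None marks an artificially removed value
    {"temperature": 20.9, "humidity": 82.0, "ph": 6.5},
    {"temperature": 21.8, "humidity": None, "ph": 7.0},
    {"temperature": None, "humidity": None, "ph": 7.8},
]
counts, matrix = nullity_summary(rows, columns)
print(counts)
print("\n".join(matrix))
```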
First rows
| | temperature | humidity | ph | water availability | season | label |
|---|---|---|---|---|---|---|
| 0 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | 0 | 0 |
| 1 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | 0 | 0 |
| 2 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | 0 | 0 |
| 3 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | 0 | 0 |
| 4 | 20.130175 | 81.604873 | 7.628473 | 262.717340 | 0 | 0 |
| 5 | 23.058049 | 83.370118 | 7.073454 | 251.055000 | 0 | 0 |
| 6 | 22.708838 | 82.639414 | 5.700806 | 271.324860 | 0 | 0 |
| 7 | 20.277744 | 82.894086 | 5.718627 | 241.974195 | 0 | 0 |
| 8 | 24.515881 | 83.535216 | 6.685346 | 230.446236 | 0 | 0 |
| 9 | 23.223974 | 83.033227 | 6.336254 | 221.209196 | 0 | 0 |
Last rows
| | temperature | humidity | ph | water availability | season | label |
|---|---|---|---|---|---|---|
| 1390 | 23.787560 | 74.367941 | 6.014572 | 172.644265 | 0 | 12 |
| 1391 | 25.499417 | 75.999876 | 6.663559 | 193.714183 | 0 | 12 |
| 1392 | 23.249256 | 73.653468 | 6.434611 | 184.767486 | 0 | 12 |
| 1393 | 26.985822 | 89.055879 | 7.432768 | 193.877871 | 0 | 12 |
| 1394 | 23.614753 | 86.142903 | 6.987333 | 150.235524 | 0 | 12 |
| 1395 | 23.874845 | 86.792613 | 6.718725 | 177.514731 | 0 | 12 |
| 1396 | 23.928879 | 88.071123 | 6.880205 | 154.660874 | 0 | 12 |
| 1397 | 24.814412 | 81.686889 | 6.861069 | 190.788639 | 0 | 12 |
| 1398 | 24.447439 | 82.286484 | 6.769346 | 190.968489 | 0 | 12 |
| 1399 | 26.574217 | 73.819949 | 7.261581 | 159.322307 | 0 | 12 |